Combating Web Spam with TrustRank

نویسندگان

  • Zoltán Gyöngyi
  • Hector Garcia-Molina
  • Jan O. Pedersen
چکیده

Web spam pages use various techniques to achieve higher-than-deserved rankings in a search engine’s results. While human experts can identify spam, it is too expensive to manually evaluate a large number of pages. Instead, we propose techniques to semi-automatically separate reputable, good pages from spam. We first select a small set of seed pages to be evaluated by an expert. Once we manually identify the reputable seed pages, we use the link structure of the web to discover other pages that are likely to be good. In this paper we discuss possible ways to implement the seed selection and the discovery of good pages. We present results of experiments run on the World Wide Web indexed by AltaVista and evaluate the performance of our techniques. Our results show that we can effectively filter out spam from a significant fraction of the web, based on a good seed set of less than 200 sites.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Fast Asynchronous Anti-TrustRank for Web Spam Detection

Web spam detection is an important problem in Web search. Since Web spam pages tend to have a lot of spurious links, many Web spam detection algorithms exploit the hyperlink structure between the Web pages to detect the spam pages. Anti-TrustRank algorithm is a well-known link-based spam detection algorithm which follows the principle that spam pages are likely to be referenced by other spam pa...

متن کامل

Link-Based Characterization and Detection of Web Spam

We perform a statistical analysis of a large collection of Web pages, focusing on spam detection. We study several metrics such as degree correlations, number of neighbors, rank propagation through links, TrustRank and others to build several automatic web spam classifiers. This paper presents a study of the performance of each of these classifiers alone, as well as their combined performance. ...

متن کامل

Link-Based Spam Algorithms in Adversarial Information Retrieval

Web spam has become one of the most exciting challenges and threats to Web search engines. The relationship between the search systems and those who try to manipulate them came up with the field of adversarial information retrieval. In this paper, we have set up several experiments to compare HostRank and TrustRank to show how effective it is for TrustRank to combat Web spam and we have also re...

متن کامل

Propagating Trust and Distrust to Demote Web Spam

Web spamming describes behavior that attempts to deceive search engine’s ranking algorithms. TrustRank is a recent algorithm that can combat web spam by propagating trust among web pages. However, TrustRank propagates trust among web pages based on the number of outgoing links, which is also how PageRank propagates authority scores among Web pages. This type of propagation may be suited for pro...

متن کامل

SIGIR 2006 Workshop on Adversarial Information Retrieval on the Web AIRWeb 2006

We perform a statistical analysis of a large collection of Web pages, focusing on spam detection. We study several metrics such as degree correlations, number of neighbors, rank propagation through links, TrustRank and others to build several automatic web spam classifiers. This paper presents a study of the performance of each of these classifiers alone, as well as their combined performance. ...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2004